Extensive Simulations for Longest Common Subsequences: Finite Size Scaling, a Cavity Solution, and Configuration Space Properties
Author
Abstract
Given two strings X and Y of N and M characters respectively, the Longest Common Subsequence (LCS) Problem asks for the longest sequence of (non-contiguous) matches between X and Y. Let $L_N$ be the length of an LCS of two random strings of size N. Using extensive Monte Carlo simulations for this problem, we find a finite-size scaling law of the form $E(L_N)/N = \gamma_S + A_S/(\sqrt{N}\,\ln N) + \dots$, where $\gamma_S$ and $A_S$ are constants depending on S, the alphabet size. We provide precise estimates of $\gamma_S$ for $2 \le S \le 15$. We also study the related Bernoulli Matching model, where the different entries of the "strings" are matched independently with probability $1/S$. Let $L^B_{NM}$ be the length of a longest sequence of matches in this case, for a given instance of size $N \times M$. On the basis of a cavity-like analysis we find $\gamma^B_S(r) = (2\sqrt{rS} - r - 1)/(S - 1)$, where $\gamma^B_S(r)$ is the limit of $E(L^B_{NM})/N$ as $N \to \infty$, the ratio $r = M/N$ being fixed. This formula agrees very well with our numerical computations of $E(L^B_{NM})$. It also provides a very good approximation for $\gamma_S(r)$, the corresponding function of the random string model, and the approximation gets better as S increases. We finally study the "ground state" properties of this problem. We find that the number $N_{LCS}$ of solutions typically grows exponentially with N. In other words, this system does not satisfy "Nernst's principle". This is also reflected at the level of the overlap between two LCSs chosen at random, which is found to be self-averaging and to approach a definite value $q_S < 1$ as $N \to \infty$.
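A short Python sketch can illustrate the quantities defined in this abstract. It is not the authors' simulation code. The function names (`longest_match_chain`, `lcs_length`, `mean_lcs_ratio`, `mean_bernoulli_ratio`, `cavity_gamma_b`), the sample sizes and the seed are hypothetical choices made for illustration.

The sketch does four things:
- It computes an LCS length with the textbook dynamic-programming recurrence.
- It averages that length over random strings to estimate $E(L_N)/N$.
- It runs the same recurrence on an independent Bernoulli match grid to estimate $E(L^B_{NM})/N$.
- It evaluates the cavity formula.

At the small N used here, finite-size corrections are large, so the printed estimates should not be read as values of $\gamma_S$.

```python
import math
import random

# Illustrative sketch only: all function names, sizes and the seed are
# hypothetical and not taken from the paper.


def longest_match_chain(match, n, m):
    """Length of a longest chain of matched cells (i1 < i2 < ..., j1 < j2 < ...)
    in an n x m grid, where match(i, j) says whether row i matches column j.

    Standard O(n*m) dynamic programming that keeps only one previous row.
    """
    prev = [0] * (m + 1)
    for i in range(n):
        curr = [0] * (m + 1)
        for j in range(1, m + 1):
            if match(i, j - 1):
                # Extend the best chain lying strictly above and to the left.
                curr[j] = prev[j - 1] + 1
            else:
                # Best chain that skips either row i or column j - 1.
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[m]


def lcs_length(x, y):
    """LCS length: cell (i, j) is a match when x[i] == y[j]."""
    return longest_match_chain(lambda i, j: x[i] == y[j], len(x), len(y))


def mean_lcs_ratio(n, s, trials, rng):
    """Monte Carlo estimate of E(L_N)/N for two independent uniform random
    strings of length n over an alphabet of size s."""
    total = 0
    for _ in range(trials):
        x = [rng.randrange(s) for _ in range(n)]
        y = [rng.randrange(s) for _ in range(n)]
        total += lcs_length(x, y)
    return total / (trials * n)


def mean_bernoulli_ratio(n, m, s, trials, rng):
    """Monte Carlo estimate of E(L^B_NM)/N in the Bernoulli Matching model:
    each cell of the n x m grid is a match independently with probability 1/s."""
    total = 0
    for _ in range(trials):
        grid = [[rng.random() < 1.0 / s for _ in range(m)] for _ in range(n)]
        total += longest_match_chain(lambda i, j: grid[i][j], n, m)
    return total / (trials * n)


def cavity_gamma_b(r, s):
    """Cavity-like prediction gamma^B_S(r) = (2*sqrt(r*S) - r - 1) / (S - 1),
    with r = M/N; requires s > 1."""
    return (2.0 * math.sqrt(r * s) - r - 1.0) / (s - 1.0)


if __name__ == "__main__":
    rng = random.Random(1998)  # arbitrary seed
    n, s = 100, 4  # small illustrative sizes, far from N -> infinity
    print("E(L_N)/N     ~", mean_lcs_ratio(n, s, 10, rng))
    print("E(L^B_NN)/N  ~", mean_bernoulli_ratio(n, n, s, 10, rng))
    print("cavity, r = 1:", cavity_gamma_b(1.0, s))  # 2/(1 + sqrt(4)) = 2/3
```

Both models use one recurrence, which only needs to know whether cell (i, j) is a match. Keeping a single previous row gives O(NM) time and O(M) memory. As a quick check of the formula, setting r = 1 gives $\gamma^B_S(1) = 2/(1+\sqrt{S})$, which is 2/3 for S = 4.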
Similar resources
An Effective Branch-and-Bound Algorithm to Solve the k-Longest Common Subsequence Problem
In this paper, we study the Longest Common Subsequence problem of multiple sequences. Because the problem is NP-hard, we devise an effective Branch-and-Bound algorithm to solve it. Results of extensive computational experiments show our method to be effective not only on randomly generated benchmark instances but also on real-world protein sequence instances.
Hardness of Longest Common Subsequence for Sequences with Bounded Run-Lengths
The longest common subsequence (LCS) problem is a classic and well-studied problem in computer science with extensive applications in diverse areas ranging from spelling error correction to molecular biology. This paper focuses on LCS for fixed alphabet size and fixed run-lengths (i.e., maximum number of consecutive occurrences of the same symbol). We show that LCS is NP-complete even when rest...
Similarity Search on Uncertain Spatio-temporal Data
In this work, we address the problem of similarity search in a database of uncertain spatio-temporal objects. Each object is defined by a set of observations ((time, location)-tuples) and a Markov chain which describes the object's uncertain motion in space and time. To model similarity, which is an important building block for many applications such as identifying frequent motion patterns or traj...
Journal:
Volume / Issue:
Pages: -
Publication date: 1998